Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision
Abstract
We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the limited generalizability of models trained solely on the starkly limited publicly available 3D pose data. We propose novel CNN supervision techniques: a regularization structure used during training that extends the concept of multi-level skip connections, and the use of first- and second-order parent relationships along the skeletal kinematic tree to learn better representations. We introduce a new training set for human body pose estimation from monocular images of real humans, with ground truth captured by a multi-camera markerless motion capture system. It complements existing corpora with greater diversity in pose, human appearance, clothing, occlusion, and viewpoint, and enables a wider range of augmentation. We also contribute a new benchmark that covers outdoor and indoor scenes. We further combine these contributions with transfer learning from 2D human pose prediction to achieve even better generalization, and improve over the state of the art on standard benchmarks by more than 25%. We argue that transfer learning of representations, used in tandem with algorithmic and data contributions, is crucial for general progress along many different dimensions of the problem.
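The abstract does not spell out how the first- and second-order parent relationships enter training. As a rough illustration only, assuming each joint is a 3D position and the skeleton is encoded as a parent-index array with -1 for the root (a hypothetical layout, not necessarily the authors'), the sketch below computes root-relative positions plus offsets from each joint's parent (bone vectors) and grandparent. The helper names `ancestor`, `root_relative`, and `parent_relative` are made up for this example.

```python
from typing import List, Sequence, Tuple

# A 3D joint position (x, y, z); units are whatever the dataset uses.
Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def ancestor(joint: int, parents: Sequence[int], order: int) -> int:
    """Walk `order` steps up the kinematic tree, stopping early at the root.

    `parents[j]` is the index of joint j's parent, or -1 for the root
    (a hypothetical encoding chosen for this sketch).
    """
    for _ in range(order):
        if parents[joint] < 0:
            break  # already at the root; cannot go higher
        joint = parents[joint]
    return joint


def root_relative(joints: Sequence[Vec3], root: int = 0) -> List[Vec3]:
    """Express every joint relative to the root joint (e.g. the pelvis)."""
    return [_sub(p, joints[root]) for p in joints]


def parent_relative(joints: Sequence[Vec3], parents: Sequence[int],
                    order: int) -> List[Vec3]:
    """Offset of each joint from its ancestor `order` levels up.

    order=1 gives bone vectors (joint minus parent); order=2 gives the
    offset from the grandparent. Joints with fewer than `order` ancestors
    are measured from the root, and the root itself maps to (0, 0, 0).
    """
    return [_sub(p, joints[ancestor(j, parents, order)])
            for j, p in enumerate(joints)]


if __name__ == "__main__":
    # Toy 4-joint chain: pelvis -> spine -> neck -> head (hypothetical skeleton).
    parents = [-1, 0, 1, 2]
    joints = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
    print(root_relative(joints))
    print(parent_relative(joints, parents, order=1))
    print(parent_relative(joints, parents, order=2))
```

Each output has the same shape as the joint array, so in a setup like the one the abstract describes such offsets could be regressed as extra targets next to the root-relative pose; how the authors actually weight or combine them is not stated in this abstract.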
Similar resources
MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based...
Using a single RGB frame for real time 3D hand pose estimation in the wild
We present a method for the real-time estimation of the full 3D pose of one or more human hands using a single commodity RGB camera. Recent work in the area has displayed impressive progress using RGBD input. However, since the introduction of RGBD sensors, there has been little progress for the case of monocular color input. We capitalize on the latest advancements of deep learning, combining ...
Camera Pose Estimation in Unknown Environments using a Sequence of Wide-Baseline Monocular Images
In this paper, a feature-based technique for camera pose estimation in a sequence of wide-baseline images has been proposed. Camera pose estimation is an important issue in many computer vision and robotics applications, such as augmented reality and visual SLAM. The proposed method can track captured images taken by a hand-held camera in room-sized workspaces with maximum scene depth of 3-4...
Human Context: Modeling Human-Human Interactions for Monocular 3D Pose Estimation
Automatic recovery of the 3D pose of multiple interacting subjects from an unconstrained monocular image sequence is a challenging and largely unaddressed problem. We observe, however, that by taking the interactions explicitly into account, treating individual subjects as mutual “context” for one another, performance on this challenging problem can be improved. Building on this observation, in this ...
Advancing human pose and gesture recognition
This thesis presents new methods in two closely related areas of computer vision: human pose estimation, and gesture recognition in videos. In human pose estimation, we show that random forests can be used to estimate human pose in monocular videos. To this end, we propose a co-segmentation algorithm for segmenting humans out of videos, and an evaluator that predicts whether the estimated poses...